DOC: Document that str.match accepts a regular expression #61879
Similar to str.fullmatch and other methods that accept regular expressions.
@@ -1374,7 +1374,7 @@ def match(self, pat: str, case: bool = True, flags: int = 0, na=lib.no_default):
 Parameters
 ----------
 pat : str
-    Character sequence.
+    Character sequence or regular expression.
By "regular expression", do you mean a string that is interpreted as a regular expression, or a compiled regular expression object?
To avoid confusion: if the former, then probably no doc change is needed; if the latter, the type hints in the signature would also need to be updated, and some code changes might be required?
I think he meant a compiled regular expression; this is how we are trying to type it in the stubs.
I believe we should align all the docs, since these methods use the functions of re under the hood. The functions below support re.Pattern, so a compiled regular expression is also accepted at runtime.
Looking at the docs, it is a bit unclear what "regular expression" means; I would assume it is just a regular string of the form r"...".
So the question is: should we allow a compiled regular expression, since it is supported at runtime?
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.rsplit.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.match.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.findall.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extract.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.extractall.html
- https://pandas.pydata.org/docs/reference/api/pandas.Series.str.fullmatch.html
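To illustrate why a compiled pattern can end up being accepted at runtime, here is a minimal sketch that uses only the standard library re module. It is not pandas code: match_start is a hypothetical helper that, by assumption, forwards its pattern to re.compile in the way the comment above describes.

```python
import re


def match_start(value, pat, flags=0):
    """Hypothetical helper (not pandas code): forward `pat` to re.compile."""
    # re.compile returns an already-compiled pattern unchanged when no flags
    # are given, so both str and re.Pattern inputs work here.
    regex = re.compile(pat, flags=flags)
    return regex.match(value) is not None


as_string = r"a\d+"
as_compiled = re.compile(r"a\d+")

print(match_start("a12", as_string))
print(match_start("a12", as_compiled))
print(re.compile(as_compiled) is as_compiled)

# Combining extra flags with a compiled pattern is rejected by re itself.
try:
    match_start("A12", as_compiled, flags=re.IGNORECASE)
except ValueError as exc:
    print(f"ValueError: {exc}")
```

One consequence of this pattern is that options such as flags only make sense for string patterns; a compiled pattern already carries its own flags.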
The documentation is the official API. If the stubs have been updated to reflect the types that are accepted, then isn't this the tail wagging the dog? If we update the documentation, then we also need to update the type annotations in the code, as well as ensure that the behavior is tested.
I see your point and you are correct. I think the confusion originally came from the fact that "regular expression" != "compiled regex".
But then I went into the stubs, and it seems like we are testing for it:
pandas/pandas/tests/strings/test_find_replace.py
Lines 562 to 571 in c888af6
def test_replace_compiled_regex_mixed_object():
    pat = re.compile(r"BAD_*")
    ser = Series(
        ["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
    )
    result = Series(ser).str.replace(pat, "", regex=True)
    expected = Series(
        ["a", np.nan, "b", np.nan, np.nan, "foo", None, np.nan, np.nan], dtype=object
    )
    tm.assert_series_equal(result, expected)
So the question is to clarify what we mean by "regular expression": is it compiled or not? Once that is settled, we can:
- clarify the docs
- update the stubs accordingly, to allow or disallow re.Pattern[str]
Please let us know, @simonjayhawkins.
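For reference, this is roughly what "allowing re.Pattern[str]" could look like in an annotation. It is only a sketch under assumptions: normalize_pattern is a hypothetical helper, not an existing pandas or pandas-stubs function, and the union type is one possible spelling of the option discussed above.

```python
from __future__ import annotations

import re


def normalize_pattern(pat: str | re.Pattern[str]) -> re.Pattern[str]:
    """Hypothetical helper: accept either form and return a compiled pattern."""
    if isinstance(pat, re.Pattern):
        # Already compiled: return it unchanged.
        return pat
    if isinstance(pat, str):
        # Plain string: interpret it as a regular expression.
        return re.compile(pat)
    raise TypeError(f"pat must be str or re.Pattern, got {type(pat).__name__}")


compiled = re.compile(r"BAD_*")
print(normalize_pattern(r"BAD_*").pattern)
print(normalize_pattern(compiled) is compiled)
```

If the decision goes the other way, the annotation would stay as plain str and the docs would say that only string patterns are supported.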